A four layer sharing HMM system for very large vocabulary isolated word recognition

Authors

Ruxin Chen

Miyuki Tanaka

Duanpei Wu

Lex Olorenshaw

Mariscela Amador

Abstract

This paper reports on a large-vocabulary, speaker-independent isolated word recognizer targeting 50,000 words. The system supports a unique four-layer sharing structure for either continuous HMMs or discrete HMMs. Evaluation is performed using a dictionary of 5,000 US city names, a dictionary of the 5,000 most frequent English words, a dictionary of 50,000 English words, and the 110,000-word CMU English dictionary. For these dictionaries, recognition accuracy ranges from 90% to 93% for the top 3 results.

The speech signal is a one-dimensional waveform, as shown in FIGURE 1. The speech signal may be labeled with a sequence of phonemes, and a word may correspond to one or more consecutive phonemes. An example of the phoneme labels of the isolated word "item" is also shown in FIGURE 1.

FIGURE 1: Speech waveform and phonemes of the isolated word "item".

A left-to-right HMM process is used to model the speech waveform in this Speech Recognition Engine (SRE), as shown in FIGURE 2. The figure displays a simple 3-state left-to-right HMM of a phoneme whose context includes the left and the right phone, i.e., the HMM is context dependent. This type of HMM is chosen because it offers convenient flexibility for state sharing between the first, second, and last states of the HMM, as explained below. A series of HMMs corresponds to a series of phonemes. Observations are emitted from each state of the HMM process, and the observation probabilities can be formulated as a probability distribution for each state of an HMM. Each state transition, shown as an arc in FIGURE 2, is associated with a state transition probability, which denotes the probability of taking that arc out of its state.

FIGURE 2: A three-state left-to-right HMM used in SRE for all the phonemes in a particular context.
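The left-to-right topology described above can be sketched in a few lines of Python. This is only an illustration, not the SRE implementation: the names `TRANS`, `EMIT`, and `forward_likelihood` are hypothetical, every number is made up, and the final state is given a self-loop of 1.0 purely to keep the toy model closed (a real phoneme model would chain into the next phoneme's HMM).

```python
# Hypothetical 3-state left-to-right HMM. TRANS[i][j] is the probability of
# taking the arc from state i to state j. Entries below the diagonal are
# zero because a left-to-right model has no backward arcs.
TRANS = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]

# Made-up discrete emission table: EMIT[state][symbol] over two symbols.
EMIT = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]


def forward_likelihood(observations, trans=TRANS, emit=EMIT):
    """Return P(observations | model) with the forward algorithm.

    The model is assumed to start in state 0 and may end in any state.
    """
    n_states = len(trans)
    if not observations:
        return 1.0
    # Initialisation: all probability mass starts in the first state.
    alpha = [0.0] * n_states
    alpha[0] = emit[0][observations[0]]
    # Recursion: propagate along the arcs, then weight by the emission.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs]
            for j in range(n_states)
        ]
    # Termination: sum over all states the model could be in at the end.
    return sum(alpha)
```

Calling `forward_likelihood([0, 1, 1])` scores a three-frame symbol sequence against this toy phoneme model. An isolated word recognizer in this style would compare such scores across the models of candidate words.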
Suppose there are several types (or streams) of observations, each with its own distribution for a given state; Equation (1), not reproduced here, is stated in terms of these per-state, per-stream distributions. The observations can be handled in two ways to create either a discrete observation HMM (DHMM) or a continuous observation HMM (CHMM). For the DHMM, the probability distribution defined in Equation (2) is a one-dimensional array in which each scalar denotes the probability of observing a given vector-quantized symbol in a given state. Equation (2) also uses a sub-probability distribution that is a component of the full distribution, together with a symbol denoting a total count. The sub-probability distribution is introduced …
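The discrete case can be illustrated with a short Python sketch. Because Equations (1) and (2) are not available in this excerpt, the combination rule used here is an assumption: streams are treated as independent, so a state's observation probability is the product of its per-stream probabilities. The table `STREAM_TABLES`, the function `state_observation_prob`, and all values are hypothetical.

```python
import math

# STREAM_TABLES[state][stream] is a one-dimensional array whose entry v is
# the probability of observing vector-quantized symbol v in that state and
# stream. All values are made up for illustration.
STREAM_TABLES = {
    0: [[0.7, 0.2, 0.1], [0.5, 0.5]],
    1: [[0.1, 0.3, 0.6], [0.9, 0.1]],
}


def state_observation_prob(state, symbols, tables=STREAM_TABLES):
    """Combine per-stream discrete probabilities for one state.

    ASSUMPTION: streams are independent, so the per-stream probabilities
    are multiplied. The source excerpt does not confirm this rule.
    """
    streams = tables[state]
    if len(symbols) != len(streams):
        raise ValueError("expected one VQ symbol per stream")
    return math.prod(dist[sym] for dist, sym in zip(streams, symbols))
```

For example, `state_observation_prob(0, [0, 1])` looks up symbol 0 in the first stream and symbol 1 in the second stream of state 0 and multiplies the two entries.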

Similar resources

MAN-MACHINE INTERACTION SYSTEM FOR SUBJECT INDEPENDENT SIGN LANGUAGE RECOGNITION USING FUZZY HIDDEN MARKOV MODEL

Sign language recognition (SLR) has attracted growing interest in the human–computer interaction community. The major challenge SLR now faces is developing methods that scale well with increasing vocabulary size, given a limited set of training data for signer-independent applications. Automatic SLR based on hidden Markov models (HMMs) is very sensitive to gesture's shape inf...

RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts

We present a novel large vocabulary OCR system, which implements a confidence- and margin-based discriminative training approach for model adaptation of an HMM-based recognition system to handle multiple fonts, different handwriting styles, and their variations. Most current HMM approaches are HTK-based systems which are maximum-likelihood (ML) trained and which try to adapt their model...

Development of HMM/Neural Network-Based Medium-Vocabulary Isolated-Word Lithuanian Speech Recognition System

This paper describes the development of a Lithuanian HMM/ANN speech recognition system, which combines artificial neural networks (ANNs) and hidden Markov models (HMMs). A hybrid HMM/ANN architecture was applied in the system. In this architecture, a fully connected three-layer neural network (a multi-layer perceptron) is trained by the conventional stochastic backpropagation algorithm to esti...

Nonreciprocal data sharing in estimating HMM parameters

Parameter tying is often used in large vocabulary continuous speech recognition (LVCSR) systems to balance model resolution and generalizability. However, one consequence of tying is that the differences among tied constructs are ignored. Parameter tying can alternatively be viewed as reciprocal data sharing, in that a tied construct uses data associated with all others in its tied class. To ...

Confidence measures for hybrid HMM/ANN speech recognition

In this paper we introduce four acoustic confidence measures which are derived from the output of a hybrid HMM/ANN large vocabulary continuous speech recognition system. These confidence measures, based on local posterior probability estimates computed by an ANN, are evaluated at both phone and word levels, using the North American Business News corpus.

Publication date: 1998
